http2: reduce per-request overhead on the server path #64265

Open

mcollina wants to merge 6 commits into nodejs:main from mcollina:http2-perf-r2

@mcollina (Member) commented Jul 2, 2026

This PR cuts a number of per-request/per-stream costs on the HTTP/2 server hot path, in six commits.

1. http2: reduce per-request allocations

  • Track 'priority'/'frameError' stream listeners by overriding the EventEmitter methods on Http2Stream instead of subscribing to 'newListener'/'removeListener'. The previous approach made every listener add/remove on every stream emit an extra tracking event (the compat layer alone adds 11 listeners per request).
  • buildNgHeaderString: replace the per-call SafeSet with a lazily allocated array, skip the sensitive-headers map() when there are none (the common case), and skip the HTTP-token regex plus connection-specific-header checks for well-known single-value header names — they are all valid tokens and none of them is connection-specific.
  • Replace per-call closures with shared named handlers in onStreamClose (natural close path), afterShutdown and Http2Stream._destroy.
  • Skip the pendingStreams Set add/delete for streams created with their native handle already available (all server streams).
  • Hoist the per-request onStreamTimeout closure factories in the compat layer, and avoid a once() wrapper allocation per server stream.
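
The listener-tracking change in the first bullet can be sketched roughly as follows. This is a minimal, hypothetical model rather than the actual Http2Stream code: `TrackedStream`, `_syncTracking()` and the `wantsPriority`/`wantsFrameError` flags are illustrative names, and removeAllListeners() is left out for brevity. Only the two tracked event names do any extra work; every other add/remove pays a couple of string comparisons instead of emitting a 'newListener'/'removeListener' event.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for Http2Stream. Instead of subscribing to
// 'newListener'/'removeListener' (which fire for every event name), it
// overrides the add/remove methods and only reacts to two event names.
class TrackedStream extends EventEmitter {
  constructor() {
    super();
    // Hypothetical flags: real code would toggle native notifications;
    // here they only record whether anyone is currently listening.
    this.wantsPriority = false;
    this.wantsFrameError = false;
  }

  _syncTracking(type) {
    // Cheap early exit for every other event ('data', 'end', ...).
    if (type !== 'priority' && type !== 'frameError') return;
    const active = this.listenerCount(type) > 0;
    if (type === 'priority') this.wantsPriority = active;
    else this.wantsFrameError = active;
  }

  on(type, listener) {
    super.on(type, listener);
    this._syncTracking(type);
    return this;
  }

  // Base addListener and on are the same function object, so overriding
  // on() alone would not cover addListener().
  addListener(type, listener) {
    return this.on(type, listener);
  }

  prependListener(type, listener) {
    super.prependListener(type, listener);
    this._syncTracking(type);
    return this;
  }

  // once() wrappers remove themselves through this.removeListener(),
  // so they are covered by this override as well.
  removeListener(type, listener) {
    super.removeListener(type, listener);
    this._syncTracking(type);
    return this;
  }

  // Base off is the same function object as base removeListener.
  off(type, listener) {
    return this.removeListener(type, listener);
  }
}
```

Because EventEmitter.prototype.once() and prependOnceListener() call this.on() and this.prependListener(), one-shot listeners go through the overridden methods too.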

2. http2: skip trailers round trip for compat responses

The compat layer always responded with waitForTrailers set, so every response paid for a wantTrailers C++ → JS callback, an empty sendTrailers() submission scheduled through setImmediate(), and an extra empty DATA frame on the wire — even though the vast majority of responses never register trailers.

When the headers are flushed as part of response.end() and no trailers have been registered, there is no further opportunity to add trailers, so waitForTrailers is now skipped. Headers flushed early (writeHead(), write(), flushHeaders()) keep the previous behavior, so trailers can still be added while streaming.

Behavior note for reviewers: trailers added after response.end() are now silently dropped. This matches HTTP/1 response.addTrailers() semantics (docs updated accordingly).
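
A rough model of the decision described above, assuming a heavily simplified response object. `CompatResponseModel`, `_flushHeaders()` and the `respondOptions` field are hypothetical names, not the compat layer's real internals; only the waitForTrailers rule and the "trailers after end() are dropped" behavior come from the description.

```javascript
// Hypothetical, simplified compat response used only to show when
// waitForTrailers is requested.
class CompatResponseModel {
  constructor() {
    this.headersSent = false;
    this.finished = false;
    this.trailers = null;
    this.respondOptions = null; // what would be passed to stream.respond()
  }

  setTrailer(name, value) {
    // After end() there is no later point at which trailers could be sent,
    // so they are silently dropped (HTTP/1 addTrailers() semantics).
    if (this.finished) return;
    if (this.trailers === null) this.trailers = {};
    this.trailers[name] = value;
  }

  writeHead() {
    // Early flush: trailers may still be added while streaming.
    if (!this.headersSent) this._flushHeaders(true);
  }

  end() {
    if (!this.headersSent) {
      // Flushed as part of end(): only wait if trailers already exist.
      this._flushHeaders(this.trailers !== null);
    }
    this.finished = true;
  }

  _flushHeaders(waitForTrailers) {
    this.headersSent = true;
    this.respondOptions = { waitForTrailers };
  }
}
```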

3. http2: avoid per-write closures in kWriteGeneric

Every _write()/_writev() allocated four closures plus an anonymous nextTick callback to coordinate the write callback with the end-of-stream check. Since the stream machinery dispatches at most one write at a time, that state now lives on the stream's kState object with shared named functions. When trailers are pending, the end-of-stream check tick is skipped entirely (the writable side cannot be shut down early anyway). Also pre-initializes the dynamically-added kState fields (shutdownWritableCalled, fd) so hot-path stores no longer transition the object shape.
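
The pattern can be sketched as follows, assuming a stripped-down stream object. `WriteStateModel`, `onWriteComplete()`, `checkEndOfStream()` and the `writeCallback`/`trailersPending` fields are hypothetical; the real kState layout and native write request are not reproduced. It shows per-write data living in a single slot on a pre-shaped state object, read by shared module-level functions instead of fresh closures.

```javascript
// Hypothetical end-of-stream check: a named module-level function, so
// process.nextTick(checkEndOfStream, stream) allocates no closure.
function checkEndOfStream(stream) {
  stream.endChecksRun++;
}

// Hypothetical shared completion handler, called with `this` bound to the
// stream. Per-write data is read from the state object instead of being
// captured by closures allocated for each write.
function onWriteComplete(status) {
  const state = this.state;
  const cb = state.writeCallback;
  state.writeCallback = null;
  cb(status === 0 ? null : new Error(`write failed with status ${status}`));
}

class WriteStateModel {
  constructor() {
    // Every field is created up front so later stores do not change the
    // object's shape (compare pre-initializing shutdownWritableCalled/fd).
    this.state = {
      writeCallback: null,
      trailersPending: false,
      shutdownWritableCalled: false,
      fd: -1,
    };
    this.endChecksScheduled = 0;
    this.endChecksRun = 0;
  }

  _write(chunk, encoding, cb) {
    // At most one write is dispatched at a time, so one slot is enough.
    if (this.state.writeCallback !== null) {
      throw new Error('overlapping writes are not expected');
    }
    this.state.writeCallback = cb;
    // With trailers pending the writable side cannot be shut down early,
    // so the end-of-stream check is skipped entirely.
    if (!this.state.trailersPending) {
      this.endChecksScheduled++;
      process.nextTick(checkEndOfStream, this);
    }
    // The (simulated) native write later calls onWriteComplete.call(this, 0).
  }
}
```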

4. http2: finish empty trailers natively for compat streams

Compat responses that flush headers before end() (writeHead()/write()/flushHeaders()) must keep waitForTrailers, so they paid a wantTrailers C++ → JS callback, an empty sendTrailers() + setImmediate(), and a trailers() call back into C++ on every response. A new internal STREAM_OPTION_AUTO_EMPTY_TRAILERS lets C++ finish the stream itself (same empty DATA + END_STREAM frame, identical wire format) when JS never registered trailers. A later setTrailer() flips the stream back to JS-managed trailers via a new disableAutoTrailers() binding, so streaming trailers work unchanged (regression test added).

Compat writeHead()+end(): +5.0% vs the previous commit (47.8k → 50.2k req/s, 8 alternating runs); multi-write streaming: ~+1%.
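
The control flow of the native shortcut can be modeled in plain JavaScript. Everything here is hypothetical: the real logic lives in C++, the option bit values are invented, and `OPT_WANT_TRAILERS`, `NativeStreamModel`, `sendFinalData()` and the frame labels are illustrative stand-ins. Only the names STREAM_OPTION_AUTO_EMPTY_TRAILERS and disableAutoTrailers() come from the description.

```javascript
// Hypothetical option bits; the real values are internal to Node.
const OPT_WANT_TRAILERS = 1 << 0;
const STREAM_OPTION_AUTO_EMPTY_TRAILERS = 1 << 1;

class NativeStreamModel {
  constructor(options, onWantTrailers) {
    this.options = options;
    this.onWantTrailers = onWantTrailers; // models the C++ -> JS callback
    this.jsCallbacks = 0;
    this.frames = [];
  }

  // Models the disableAutoTrailers() binding: JS owns trailers again.
  disableAutoTrailers() {
    this.options &= ~STREAM_OPTION_AUTO_EMPTY_TRAILERS;
  }

  // Models the JS side answering wantTrailers through sendTrailers().
  sendTrailers(trailers) {
    const empty = Object.keys(trailers).length === 0;
    this.frames.push(empty ? 'DATA(empty)+END_STREAM' : 'HEADERS(trailers)+END_STREAM');
  }

  // Called when the final chunk of body data is sent.
  sendFinalData() {
    if (!(this.options & OPT_WANT_TRAILERS)) {
      this.frames.push('DATA+END_STREAM');
      return;
    }
    this.frames.push('DATA');
    if (this.options & STREAM_OPTION_AUTO_EMPTY_TRAILERS) {
      // No trailers handed over: finish natively, no JS round trip.
      this.sendTrailers({});
    } else {
      this.jsCallbacks++;
      this.onWantTrailers(this);
    }
  }
}
```

In this model the frames produced by the automatic path and by a JS handler that sends empty trailers are identical, which is the property the description relies on.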

5. http2: reduce scheduled callbacks per request

The end-of-stream check (which merges END_STREAM into the final DATA frame) was a process.nextTick() on every write; when the write is dispatched from inside end(), the common end(chunk) case, an end() override now runs the check synchronously after the base method returns. In addition, the setImmediate() scheduled on every stream destruction to poke Http2Session[kMaybeDestroy] is now gated on the only condition where it isn't a no-op (session closed, no remaining streams); session.close() and the native ongracefulclosecomplete notification cover the other paths. The wire format was verified to be byte-identical. Throughput is consistently ~+1% across 42 paired samples, although that is within run-to-run noise for any single run.
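
A minimal sketch of the end() override, assuming a toy stream instead of a real Writable. `EndCheckModel`, `inEnd`, `endCheckPending` and the counters are hypothetical; in the real code the override wraps the base Writable end(), and the `ended` flag here only stands in for the settled writable state.

```javascript
// Hypothetical end-of-stream check: if the writable side has ended,
// END_STREAM can be carried by the final DATA frame.
function endCheck(stream) {
  stream.checks++;
  if (stream.ended) stream.endStreamMerged = true;
}

class EndCheckModel {
  constructor() {
    this.inEnd = false; // true while the end() body runs
    this.endCheckPending = false;
    this.ended = false;
    this.checks = 0;
    this.ticksScheduled = 0;
    this.endStreamMerged = false;
  }

  write(chunk) {
    this._writeGeneric(chunk);
  }

  end(chunk) {
    this.inEnd = true;
    try {
      if (chunk !== undefined) this._writeGeneric(chunk);
      this.ended = true; // writable state settles before end() returns
    } finally {
      this.inEnd = false;
    }
    // Run the handed-back check synchronously instead of on a new tick.
    if (this.endCheckPending) {
      this.endCheckPending = false;
      endCheck(this);
    }
  }

  _writeGeneric(chunk) {
    if (this.inEnd) {
      this.endCheckPending = true; // hand the check back to end()
    } else {
      this.ticksScheduled++;
      process.nextTick(endCheck, this); // writes not tied to end()
    }
  }
}
```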

6. http2: avoid copying the options in respond()

Drops the per-response { ...options } clone (respond only reads the options now) and picks up :status/date while copying the response headers instead of re-reading them from the dictionary-mode copy. Throughput-neutral on its own; removes an object clone and several dictionary lookups per response.
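
A simplified sketch of the idea, assuming a stand-alone respond(). `copyResponseHeaders()`, the returned object and the fixed 200 default are hypothetical; forcing endStream for 204/205/304 mirrors what Node's respond() does for those statuses, while HEAD handling and header validation are omitted. The caller's options are only read, and :status/date are picked up during the single copy pass.

```javascript
// Copy the headers once, noting :status and date along the way instead of
// looking them up again on the null-prototype (dictionary-mode) copy.
function copyResponseHeaders(headers) {
  const out = { __proto__: null };
  let status;
  let hasDate = false;
  for (const key of Object.keys(headers)) {
    const value = headers[key];
    out[key] = value;
    if (key === ':status') status = value;
    else if (key === 'date') hasDate = true;
  }
  return { out, status, hasDate };
}

// Hypothetical respond(): a local variable replaces
// `options = { ...options }; options.endStream = ...`.
function respond(headers, options = {}) {
  let endStream = Boolean(options.endStream);
  const { out, status, hasDate } = copyResponseHeaders(headers);
  const statusCode = status === undefined ? 200 : Number(status);
  out[':status'] = statusCode;
  if (!hasDate) out.date = new Date().toUTCString();
  // Bodiless statuses force endStream without touching the caller's object.
  if (statusCode === 204 || statusCode === 205 || statusCode === 304) {
    endStream = true;
  }
  return { headers: out, endStream };
}
```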

Negative results from the megamorphic-IC investigation (for the record)

--log-ic tracing under load shows the dominant megamorphic sites are in the events machinery (_events/_eventsCount loads and events[type] keyed loads across heterogeneous emitter shapes) — a node-wide property of EventEmitter, not addressable from http2. The header-object keyed stores/loads (toHeaderObject, header copies) are inherently megamorphic: a single keyed-store site writing several different keys always goes megamorphic regardless of repeating shapes. Replacing ObjectKeys() + keyed loads with for-in in buildNgHeaderString was tried and regressed header-heavy workloads (−1.7% at nheaders=1000, 99.9% confidence) — large null-prototype copies are dictionary-mode, where for-in has no enum-cache fast path — so it was reverted.
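
For reference, the two loop shapes being compared look roughly like this. It is an illustrative sketch, not the buildNgHeaderString code; `copyWithKeys()`, `copyWithForIn()` and `makeHeaders()` are hypothetical helpers. Both produce the same copy. The speed difference reported above comes from the author's benchmark, and this snippet makes no performance claim of its own.

```javascript
// Variant kept: Object.keys() plus indexed keyed loads.
function copyWithKeys(headers) {
  const out = { __proto__: null };
  const keys = Object.keys(headers);
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    out[key] = headers[key];
  }
  return out;
}

// Variant tried and reverted: for-in. No own-property check is needed
// here only because the input has a null prototype.
function copyWithForIn(headers) {
  const out = { __proto__: null };
  for (const key in headers) out[key] = headers[key];
  return out;
}

// Builds a large null-prototype header object, similar in spirit to the
// nheaders=1000 case of http2/headers.js.
function makeHeaders(n) {
  const headers = { __proto__: null };
  for (let i = 0; i < n; i++) headers[`x-header-${i}`] = `v${i}`;
  return headers;
}
```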

Benchmarks

h2load (-c 4 -m 100, 1 KiB payload, mean of 6 alternating runs):

server                             main          this PR       Δ
core API (stream.respond + end)    61.0k req/s   70.7k req/s   +15.9%
compat API (res.setHeader + end)   43.7k req/s   50.4k req/s   +15.3%
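
Servers along these lines match the two rows of the table (core API: stream.respond() + end(); compat API: res.setHeader() + end()) with a 1 KiB body. This is an assumed setup, not the author's benchmark script: cleartext HTTP/2 (h2c), the content-type header and the function names `createCoreServer()`/`createCompatServer()` are illustrative.

```javascript
const http2 = require('node:http2');

// 1 KiB response body, matching the payload size used in the h2load runs.
const PAYLOAD = Buffer.alloc(1024, 'x');

// Core API: respond directly on the 'stream' event.
function createCoreServer() {
  const server = http2.createServer();
  server.on('stream', (stream) => {
    stream.respond({ ':status': 200, 'content-type': 'text/plain' });
    stream.end(PAYLOAD);
  });
  return server;
}

// Compat API: the request handler receives (req, res).
function createCompatServer() {
  return http2.createServer((req, res) => {
    res.setHeader('content-type', 'text/plain');
    res.end(PAYLOAD);
  });
}
```

Either server can be started with `.listen(port)` and driven with h2load using the -c 4 -m 100 settings from the table.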

benchmark/compare.js (10 runs):

http2/headers.js nheaders=0 n=1000                                                     **     3.98 %   ±2.79%
http2/headers.js nheaders=10 n=1000                                                   ***     7.11 %   ±3.32%
http2/headers.js nheaders=100 n=1000                                                          1.67 %   ±1.73%
http2/headers.js nheaders=1000 n=1000                                                        -0.32 %   ±1.26%
http2/compat.js duration=5 benchmarker='h2load' clients=2 streams=100 requests=5000           3.02 %   ±4.95%
http2/write.js duration=5 benchmarker='h2load' size=100000 length=131072 streams=100          1.66 %   ±8.59%

(compat.js/write.js/simple.js stream a file from fs per request, so they are dominated by file streaming and mostly insensitive to per-request overhead; no regressions.)

mcollina added 2 commits July 2, 2026 19:43

http2: reduce per-request allocations

Cut several sources of per-stream/per-request overhead on the hot
path:

- Track 'priority'/'frameError' stream listeners by overriding the
  EventEmitter methods on Http2Stream instead of subscribing to
  'newListener'/'removeListener', which made every listener add and
  remove on every stream emit an extra tracking event.
- Replace the per-call SafeSet and sensitive-header mapping in
  buildNgHeaderString with a lazily allocated array and an
  empty-array fast path, and skip the HTTP token regex and
  connection-specific header checks for well-known single-value
  header names.
- Replace per-call closures with shared named handlers in
  onStreamClose, afterShutdown and Http2Stream._destroy.
- Skip the pendingStreams Set add/delete for streams that are
  created with their native handle already available (all server
  streams).
- Hoist the per-request onStreamTimeout closure factories in the
  compat layer to module-level handlers, and avoid a once() wrapper
  allocation per server stream.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating
runs: core API 60.2k -> 69.3k req/s (+15%), compat API 43.6k ->
46.2k req/s (+5.9%).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
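
The header-name fast path from the second bullet of this commit can be sketched roughly as follows. This is a hypothetical illustration: `checkHeaderNames()`, `neverIndexList()`, the shortened single-value list and the error messages are invented and do not reproduce buildNgHeaderString or its error codes. In this sketch, single-value names skip the token regex and the connection-specific lookup, and the lazily allocated array is used only to reject a repeated single-value header (for example `Content-Type` plus `content-type`).

```javascript
// Hypothetical, shortened list of well-known single-value header names.
// Each is a valid HTTP token and none is connection-specific, which is
// what makes skipping the generic checks safe.
const kSingleValue = new Set([
  'content-type', 'content-length', 'date', 'etag', 'location', 'server',
]);
// RFC 9110 token characters.
const kTokenRe = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;
const kConnectionSpecific = new Set([
  'connection', 'keep-alive', 'proxy-connection', 'transfer-encoding', 'upgrade',
]);

function checkHeaderNames(headers) {
  let singles = null; // allocated only once a single-value header shows up
  const names = [];
  for (const rawKey of Object.keys(headers)) {
    const key = rawKey.toLowerCase();
    if (kSingleValue.has(key)) {
      // Fast path: known-good name, only the duplicate check remains.
      if (singles === null) singles = [];
      else if (singles.includes(key)) throw new Error(`duplicate single-value header: ${key}`);
      singles.push(key);
    } else if (key[0] !== ':') {
      // Slow path: full validation for arbitrary (non-pseudo) names.
      if (!kTokenRe.test(key)) throw new Error(`invalid header name: ${rawKey}`);
      if (kConnectionSpecific.has(key)) throw new Error(`connection-specific header: ${key}`);
    }
    names.push(key);
  }
  return names;
}

// Skip the map() allocation in the common case of no sensitive headers.
function neverIndexList(sensitive) {
  return sensitive.length === 0 ? sensitive : sensitive.map((v) => v.toLowerCase());
}
```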

http2: skip trailers round trip for compat responses

The compat layer always responded with waitForTrailers set, so every
response paid for a wantTrailers C++ -> JS callback, an empty
sendTrailers() submission scheduled through setImmediate(), and an
extra empty DATA frame on the wire, even though the vast majority of
responses never register any trailers.

When the headers are flushed as part of response.end() and no
trailers have been registered, there is no further opportunity to
add trailers, so waitForTrailers can be skipped altogether. Headers
flushed early (writeHead, write, flushHeaders) keep the previous
behavior so trailers can still be added while streaming.

Trailers added after response.end() are now silently dropped,
matching the HTTP/1 response.addTrailers() semantics.

Also reuse a shared options object for Http2ServerRequest instances
created without explicit options.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating
runs: compat API 43.1k -> 49.9k req/s (+15.7% cumulative vs main).

Signed-off-by: Matteo Collina <hello@matteocollina.com>

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/http
  • @nodejs/http2
  • @nodejs/net

nodejs-github-bot added the http2 (Issues or PRs related to the http2 subsystem) and needs-ci (PRs that need a full CI run) labels Jul 2, 2026
mcollina added 4 commits July 2, 2026 20:52

http2: avoid per-write closures in kWriteGeneric

Every _write()/_writev() on an Http2Stream allocated four closures
and an anonymous nextTick callback to coordinate the write callback
with the end-of-stream check. Since the stream machinery dispatches
at most one write at a time, that coordination state can live on the
stream's kState object instead, with shared named functions for the
end check and completion logic.

When trailers are pending the writable side cannot be shut down
early anyway, so the end-of-stream check tick is now skipped
entirely for those writes.

Also pre-initialize the kState fields that used to be added
dynamically (shutdownWritableCalled, fd) so hot-path stores no
longer transition the object shape.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating
runs vs main: core API 61.0k -> 70.7k req/s (+15.9% cumulative),
compat API 43.7k -> 50.4k req/s (+15.3% cumulative).

Signed-off-by: Matteo Collina <hello@matteocollina.com>

http2: finish empty trailers natively for compat streams

When the compat layer flushes response headers before the response
is ended (writeHead(), write(), flushHeaders()), it must keep
waitForTrailers so that trailers can still be added while streaming.
As a result, every such response paid for a wantTrailers C++ -> JS
callback, an empty sendTrailers() with its setImmediate(), and a
trailers() call back into C++, even though most responses never
register any trailers.

Introduce STREAM_OPTION_AUTO_EMPTY_TRAILERS: when set and no
trailers have been handed to the native side by the time the final
DATA frame is sent, the stream is finished directly in C++ with the
same empty DATA frame carrying END_STREAM that the JS path would
have produced, without calling into JS at all. The compat layer
enables this mode whenever it responds with waitForTrailers and no
trailers registered yet; a later setTrailer() call flips the stream
back to JS-managed trailers through a new disableAutoTrailers()
binding, so streaming trailers keep working unchanged.

The wire format is identical in all cases.

h2load -c 4 -m 100, 1 KiB payload, mean of 8 alternating runs
against the previous commit: compat writeHead()+end() 47.8k -> 50.2k
req/s (+5.0%); multi-write streaming responses +1%.

Signed-off-by: Matteo Collina <hello@matteocollina.com>

http2: reduce scheduled callbacks per request

Two per-request scheduling eliminations:

- The end-of-stream check that lets the final DATA frame carry the
  END_STREAM flag was scheduled with process.nextTick() on every
  write. When the write is dispatched from inside end() - the common
  case of end(chunk) - the check can instead run synchronously once
  end() returns and the writable state has settled. An end()
  override marks the stream while the base method runs, and
  [kWriteGeneric] hands the check back to it instead of scheduling
  a tick. Writes not tied to end() keep the nextTick behavior.

- Every stream destruction scheduled a setImmediate() to ask the
  session to clean itself up, but Http2Session[kMaybeDestroy] is a
  no-op unless the session is closed and has no remaining streams.
  Gate the setImmediate() on that condition: session.close() runs
  its own check, and the native side notifies again through
  ongracefulclosecomplete once pending data is flushed.

The wire format is unchanged (verified byte-identical h2load
traffic), and the END_STREAM merge is preserved.

h2load -c 4 -m 100, 1 KiB payload, alternating runs vs the previous
commit: consistently around +1% (within run-to-run noise on any
single set, positive across 42 paired samples).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
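
The setImmediate() gate from the second bullet of this commit can be modeled like this. `SessionModel`, `runMaybeDestroy()` and the counters are hypothetical, and the native ongracefulclosecomplete path is not modeled; the sketch only shows that the immediate is scheduled when the destroy check can actually do something.

```javascript
// Named function so scheduling it allocates no closure.
function runMaybeDestroy(session) {
  session.maybeDestroy();
}

class SessionModel {
  constructor() {
    this.closed = false;
    this.streams = new Set();
    this.destroyed = false;
    this.immediatesScheduled = 0;
  }

  // No-op unless the session is closed and has no remaining streams.
  maybeDestroy() {
    if (this.closed && this.streams.size === 0) this.destroyed = true;
  }

  close() {
    this.closed = true;
    this.maybeDestroy(); // close() runs its own check
  }

  onStreamDestroyed(stream) {
    this.streams.delete(stream);
    // Previously: setImmediate() on every stream destruction.
    // Now: only when maybeDestroy() would not be a no-op.
    if (this.closed && this.streams.size === 0) {
      this.immediatesScheduled++;
      setImmediate(runMaybeDestroy, this);
    }
  }
}
```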

http2: avoid copying the options in respond()

respond() copied the user-provided options object on every call just
so it could normalize and locally flip options.endStream, and
prepareResponseHeadersObject() then looked the :status and date
fields up again on the dictionary-mode null-prototype headers copy
it had just built. Use a local variable for endStream and pick up
:status/date while copying the headers instead.

No measurable throughput change on its own; this removes an object
clone and several dictionary-mode property lookups per response.

Signed-off-by: Matteo Collina <hello@matteocollina.com>